The JMU Reddit dataset contains posts from student users discussing their university and its surrounding area. The data include geographic locations, allowing us to connect online dialogue to specific on-campus events, spaces, and locations. Common topics include student life, dorms, dining halls, and campus events, and the tone toward the university and its community is largely positive. By comparison, ODU's dataset includes Reddit posts that extend beyond campus, focusing more on the city of Norfolk and the Hampton Roads region. The ODU posts deal with off-campus issues such as housing, parking, crime, and neighborhoods, and as a result the tone is more negative.
Together, these two datasets highlight a difference in focus between JMU's on-campus environment and ODU's off-campus surroundings. This leads to our hypothesis: ODU's campus dialogue emphasizes off-campus life and safety concerns more than JMU's does.
First, to gather this information, we compare the difference in focus between on- and off-campus topics at JMU and ODU: for JMU we look at university buildings, dorms, and campus events, while for ODU we focus on the city of Norfolk and its surrounding roads and areas. The visualization steps allow us to embed an interactive Voyant visualization comparing the two datasets directly in the notebook, which lets us visually analyze patterns like word usage, sentiment, and topic emphasis between the corpora. Embedding the Voyant visualizations makes the analysis of the two corpora clearer, more credible, and more engaging.
After examining both datasets, ODU's Reddit corpus emphasizes off-campus life and community concerns, while JMU's corpus focuses on internal campus experiences. In the ODU dataset, student dialogue references Norfolk neighborhoods, safety concerns, crime, parking, and housing difficulties, suggesting that students engage more with the urban environment surrounding their university. The tone of these discussions tends to be negative, reflecting frustration with off-campus living conditions around ODU. The JMU dataset, by contrast, references campus landmarks, dorms, student activities, and dining halls, creating a more positive tone around the social and physical spaces of campus life and reflecting a stronger sense of community tied to on-campus experiences. This comparison supports the hypothesis that ODU students' discussions are shaped by off-campus challenges, while JMU's dialogue centers on positive, campus-focused topics.
ODU's student dialogue focuses more on off-campus issues such as Norfolk neighborhoods, crime, and housing; its data show more negative statements associated with off-campus terms like Norfolk, Hampton Roads, graffiti, and parking. Meanwhile, JMU's student dialogue focuses heavily on on-campus spaces such as campus buildings, dorms, dining halls, and student life, with more positive statements associated with on-campus terms like the Quad, dorms, and certain campus events.
Different Trends around ODU and Norfolk
Words related to ODU
The visualization step helps test our hypothesis by either confirming the prediction, when the maps show patterns that support it, or complicating it, when they show unexpected overlaps or contradictions between the two datasets. It also supported our hypothesis by showing on the maps which locations each campus's discussions focused on: ODU's focus falls on Norfolk and its surrounding roads and areas, while JMU's falls on campus itself.
Part 3: Data Cleaning and Refinement¶
# =============================================================================
# SETUP: Import Libraries and Load Data
# =============================================================================
# This cell sets up all the tools we need for spatial sentiment analysis
# Force reload to pick up any changes to data_cleaning_utils
# This ensures we get the latest version of our custom functions
import importlib
import sys
if 'data_cleaning_utils' in sys.modules:
    importlib.reload(sys.modules['data_cleaning_utils'])
# Core data analysis library - like Excel but for Python
import pandas as pd
# Import our custom functions for cleaning and analyzing location data
from data_cleaning_utils import (
clean_institution_dataframe, # Standardizes and cleans location data
get_data_type_summary, # Shows what types of data we have
get_null_value_summary, # Identifies missing data
create_location_counts, # Counts how often places are mentioned
create_location_sentiment, # Calculates average emotions by location
create_time_animation_data, # Prepares data for animated time series
)
# Interactive plotting library - creates maps and charts
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as pyo
# =============================================================================
# CONFIGURE PLOTLY FOR HTML EXPORT
# =============================================================================
# Configure Plotly for optimal HTML export compatibility
# Method 1: Set renderer for HTML export (use 'notebook' for Jupyter environments)
pio.renderers.default = "notebook"
# Method 2: Configure Plotly for offline use (embeds JavaScript in HTML)
pyo.init_notebook_mode(connected=False) # False = fully offline, no external dependencies
# Method 3: Set template for clean HTML appearance
pio.templates.default = "plotly_white"
# Method 4: Configure Plotly to include plotly.js in HTML exports
import plotly
plotly.offline.init_notebook_mode(connected=False)
# Load the cleaned JMU Reddit data (already processed and ready to use)
# This contains: posts, locations, coordinates, sentiment scores, and dates
df_jmu = pd.read_pickle("assets/data/jmu_reddit_geoparsed_clean.pickle")
# =============================================================================
# LOAD YOUR INSTITUTION'S DATA
# =============================================================================
# Replace the group number and institution name with your assigned data
# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number (e.g., "group_1", "group_2", etc.)
# Replace "UNC_processed.csv" with your institution's file name
df_institution = pd.read_csv("group_data_packets/group_2/python/odu_processed_clean.csv")
# =============================================================================
# CREATE RAW LOCATION MAP (Before Cleaning)
# =============================================================================
# This shows the "messy" data before we fix location errors
# You'll see why data cleaning is essential!
# STEP 1: Count how many times each place is mentioned
# Group identical place names together and count occurrences
place_counts = df_institution.groupby('place').agg({
'place': 'count', # Count how many times each place appears
'latitude': 'first', # Take the first latitude coordinate for each place
'longitude': 'first', # Take the first longitude coordinate for each place
'place_type': 'first' # Take the first place type classification
}).rename(columns={'place': 'count'}) # Rename the count column for clarity
# STEP 2: Prepare data for mapping
# Reset index makes 'place' a regular column instead of an index
place_counts = place_counts.reset_index()
# Remove any places that don't have valid coordinates (latitude/longitude)
# This prevents errors when trying to plot points on the map
place_counts = place_counts.dropna(subset=['latitude', 'longitude'])
# STEP 3: Create interactive scatter map
# Each dot represents a place, size = how often it's mentioned
fig = px.scatter_map(
place_counts, # Our prepared data
lat='latitude', # Y-coordinate (north-south position)
lon='longitude', # X-coordinate (east-west position)
size='count', # Bigger dots = more mentions
hover_name='place', # Show place name when hovering
hover_data={ # Additional info in hover tooltip
'count': True, # Show mention count
'place_type': True, # Show what type of place it is
'latitude': ':.4f', # Show coordinates with 4 decimal places
'longitude': ':.4f'
},
size_max=25, # Maximum dot size on map
zoom=4, # How zoomed in the map starts (higher = closer)
title='Raw Location Data: Places Mentioned in ODU Reddit Posts',
center=dict(lat=37.5, lon=-78) # Center map on Virginia for ODU
)
# STEP 4: Customize map appearance
fig.update_layout(
map_style="carto-positron", # Clean, light map style
width=800, # Map width in pixels
height=600, # Map height in pixels
title_font_size=16, # Title text size
title_x=0.5 # Center the title
)
# Configure for HTML export compatibility
fig.show(config={'displayModeBar': True, 'displaylogo': False})
Toponym Misalignment Analysis
One of the biggest toponym misalignments in the ODU dataset came from the geoparser placing locations far outside Virginia when they clearly referred to areas near campus or within Norfolk. For example, some neighborhood names were incorrectly matched to towns in Texas or locations overseas; street names appeared in other states simply because the same names exist elsewhere; and references like "campus," "dorms," or "the quad" were sometimes misinterpreted as specific geographic places, producing coordinates that did not correspond to ODU at all. This caused the map to display clusters unrelated to ODU's student discussions or campus environment. These errors distorted the spatial layout of the data, which shows the need for more context-aware geoparsing and coordinate validation.
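One simple check of this kind could even be automated: flag any coordinate pair that falls outside a rough bounding box around Virginia as a likely false positive. This is a minimal sketch of the idea; the box values and function name are our own illustration, not part of the course utilities.

```python
# Rough bounding box around Virginia (approximate values, for illustration only)
VA_LAT_MIN, VA_LAT_MAX = 36.5, 39.5
VA_LON_MIN, VA_LON_MAX = -83.7, -75.2

def likely_false_positive(lat, lon):
    """Flag coordinates that fall outside a rough Virginia bounding box."""
    return not (VA_LAT_MIN <= lat <= VA_LAT_MAX and VA_LON_MIN <= lon <= VA_LON_MAX)

# The "Northern Lights Hotel" match from the cleaned table (59.9N, 30.3E, in Russia)
# is flagged, while coordinates on ODU's campus are not.
print(likely_false_positive(59.9166, 30.2500))   # → True
print(likely_false_positive(36.8853, -76.3059))  # → False
```

A check like this would not catch in-state mismatches (a Norfolk street mapped to a Richmond street), but it would have surfaced the overseas and out-of-state clusters before they reached the map.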
Revised Map
When fixing the map, the biggest improvements came from correcting locations that had been wrongly assigned to places far beyond Norfolk. Locations that referenced ODU's surrounding environment, such as neighborhoods near campus, local roads, or common student areas, had been mapped to unrelated cities, states, or even other countries. The group removed these false positives and assigned corrected place names and coordinates. In the process, we noticed that the frequently mentioned areas near ODU involved safety concerns, commuting routes, and off-campus neighborhoods rather than campus buildings themselves, which further confirmed our hypothesis that ODU students' concerns are more off-campus and negative, in contrast to the positive on-campus focus of JMU's discussions and data. Overall, cleaning up the map revealed not only the importance of accurate geoparsing but also how strongly ODU students' posts are shaped by their experiences in the surrounding Norfolk area.
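Under one simplified reading of that workflow, the cleanup step can be sketched in pandas using the `false_positive` flag and the `revised_*` columns from the cleaned CSV. The rows below are hypothetical (Ghent is a real Norfolk neighborhood that shares its name with Ghent, Belgium); this is a toy sketch, not the course's `clean_institution_dataframe`.

```python
import pandas as pd

# Toy rows with the same columns as the cleaned ODU CSV (values are hypothetical)
df = pd.DataFrame({
    "place": ["Ghent", "Main St", "Ghent, Belgium"],
    "revised_place": ["Ghent (Norfolk)", "Main St (Norfolk)", None],
    "revised_latitude": [36.87, 36.85, None],
    "revised_longitude": [-76.30, -76.29, None],
    "false_positive": [False, False, True],
})

# Drop rows flagged as false positives, then keep only rows with corrected coordinates
df_clean = df[~df["false_positive"]].dropna(subset=["revised_latitude", "revised_longitude"])
print(df_clean["revised_place"].tolist())  # → ['Ghent (Norfolk)', 'Main St (Norfolk)']
```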
# =============================================================================
# LOAD CLEANED DATA
# =============================================================================
# Load the CSV file you manually cleaned in Google Sheets
# 📝 TO DO: Update these paths for your group
# Replace "group_6" with your group number
# Replace "UNC_processed_clean.csv" with your institution's cleaned file
df_institution_cleaned = pd.read_csv(
"group_data_packets/group_2/python/odu_processed_clean.csv"
)
# =============================================================================
# APPLY DATA CLEANING FUNCTIONS
# =============================================================================
# Use our custom function to standardize the cleaned data
# Apply the cleaning function to standardize data types and handle missing values
# This function ensures all datasets have the same format for consistent analysis
df_institution_cleaned = clean_institution_dataframe(df_institution_cleaned)
# Display first few rows to verify the cleaning worked properly
# This shows the structure and sample content of your cleaned data
df_institution_cleaned.head()
DataFrame cleaned successfully!
| | school_name | unique_id | date | sentences | roberta_compound | place | latitude | longitude | revised_place | revised_latitude | revised_longitude | place_type | false_positive | checked_by |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ODU | ODU_20 | 2024-10-10 19:58:32 | Northern Lights at odu right now | 0.004979 | Northern Lights Hotel | 59.91660 | 30.25000 | Old Dominion University | 36.88530 | -76.30590 | University | True | Alex |
| 1 | ODU | ODU_26 | 2024-10-10 20:29:40 | Been in VA 31 years and this year is the first... | 0.105875 | Virginia | 37.54812 | -77.44675 | Old Dominion University | 36.88530 | -76.30590 | University | False | Alex |
| 2 | ODU | ODU_31 | 2024-10-11 01:14:32 | The CME is moving at a speed estimate of 1200-... | 0.002683 | Earth | 0.00000 | 0.00000 | Earth | 0.00000 | 0.00000 | none | True | Alex |
| 3 | ODU | ODU_33 | 2024-10-10 21:36:30 | Nope, I saw them up here in Manassas, well I s... | -0.036483 | City of Manassas | 38.75095 | -77.47527 | City of Manassas | 38.75095 | -77.47527 | city | False | Alex |
| 4 | ODU | ODU_36 | 2024-10-10 21:38:42 | Isn’t this literally the worst time to be in T... | -0.822557 | Tampa | 27.94752 | -82.45843 | Tampa | 27.94752 | -82.45843 | city | False | Alex |
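The `roberta_compound` column above holds a compound sentiment score roughly in [-1, 1]. As a sketch of how such scores are commonly read (the ±0.05 cutoffs follow the widely used VADER convention and are an assumption here, not something defined by the course utilities):

```python
def sentiment_label(compound, threshold=0.05):
    """Bucket a compound sentiment score into positive / neutral / negative."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Scores taken from the sample rows above
print(sentiment_label(0.105875))   # Virginia post → positive
print(sentiment_label(0.004979))   # Northern Lights post → neutral
print(sentiment_label(-0.822557))  # Tampa post → negative
```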
Map Customization
# =============================================================================
# CREATE CLEANED LOCATION MAP (After Manual Corrections)
# =============================================================================
# This map shows your data AFTER you fixed the location errors
# Compare this to the raw map above to see the improvement!
# STEP 1: Count occurrences using CLEANED/CORRECTED location data
# Now we use 'revised_place' instead of 'place' - these are your corrections!
place_counts = (
df_institution_cleaned.groupby("revised_place") # Group by corrected place names
.agg(
{
"revised_place": "count", # Count mentions of each corrected place
"revised_latitude": "first", # Use corrected latitude coordinates
"revised_longitude": "first", # Use corrected longitude coordinates
"place_type": "first", # Keep place type classification
}
)
.rename(columns={"revised_place": "count"}) # Rename count column for clarity
)
# STEP 2: Prepare data for mapping
place_counts = place_counts.reset_index() # Make 'revised_place' a regular column
# Remove places without valid corrected coordinates
place_counts = place_counts.dropna(subset=["revised_latitude", "revised_longitude"])
# STEP 3: Create the cleaned location map
fig = px.scatter_map(
place_counts,
lat="revised_latitude", # Use corrected Y-coordinates
lon="revised_longitude", # Use corrected X-coordinates
size="count", # Dot size = mention frequency
hover_name="revised_place", # Show corrected place name on hover
hover_data={
"count": True, # Show how many mentions
"place_type": True, # Show place category
"revised_latitude": ":.4f", # Show corrected coordinates
"revised_longitude": ":.4f",
},
size_max=45, # Maximum dot size
title="Places around ODU off campus",
zoom=12, # 📝 TO DO: Adjust zoom level for your region
center=dict(lat=36.8853, lon=-76.3059), # 📝 TO DO: Adjust center coordinates
)
# STEP 4: Customize map appearance
fig.update_layout(
map_style="carto-positron", # Clean, readable map style
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
After cleaning the dataset, the spatial patterns related to ODU are much clearer and more accurate. Removing false positives, such as locations that were incorrectly mapped outside the region, revealed the true geography of ODU-related Reddit discussions. With the irrelevant, out-of-state locations removed, a clearer pattern emerges that further supports our hypothesis: posts tied to actual Norfolk neighborhoods, roads, and areas surrounding campus often carry more negative sentiment, while the fewer posts connected directly to campus locations appear more neutral or mixed. The improved map reinforces the idea that ODU conversations are focused off campus and shaped by environmental factors beyond university grounds. Overall, fixing the geoparsing errors strengthens the reliability of our spatial sentiment analysis and provides a more realistic picture of how ODU students discuss their surrounding environment online.
JMU Spatial Distribution
# =============================================================================
# JMU SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a filtered map showing only certain types of places for JMU
# STEP 1: Use custom function to filter and count JMU locations
# This function applies the same filtering to both datasets for fair comparison
JMU_filtered_locations = create_location_counts(
df_jmu, # JMU Reddit data
minimum_count=2, # Only show places mentioned 2+ times
place_type_filter=['State', 'City', 'Country'] # Only these place types
)
# STEP 2: Create colored scatter map
# Each place type gets a different color to show spatial patterns
fig = px.scatter_map(
JMU_filtered_locations,
lat="revised_latitude",
lon="revised_longitude",
size="count", # Dot size = mention frequency
color="place_type", # Different colors for different place types
hover_name="revised_place",
hover_data={
"count": True,
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25,
zoom=13, # 📝 TO DO: Adjust to highlight interesting patterns
title="On campus places at JMU",
center=dict(lat=38.435318880428284, lon=-78.86980844807114), # 📝 TO DO: Center on area of interest
color_discrete_sequence=px.colors.qualitative.Plotly # Categorical color palette
)
# STEP 3: Customize layout
fig.update_layout(
map_style="carto-positron",
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig.show(config={'displayModeBar': True, 'displaylogo': False})
# =============================================================================
# YOUR INSTITUTION'S SPATIAL DISTRIBUTION MAP
# =============================================================================
# Create a comparable map for your institution using identical filtering
# STEP 1: Apply the same filtering to your institution's data
# Using identical parameters ensures fair comparison with JMU
institution_filtered_locations = create_location_counts(
df_institution_cleaned, # Your cleaned institution data
minimum_count=2, # Same minimum as JMU map
place_type_filter=["city","County","road","region", "state"] # Note: broader place types than the JMU map, chosen to capture ODU's off-campus focus
)
# STEP 2: Create matching visualization
# Keep all settings the same as JMU map for direct comparison
fig_institution_cleaned = px.scatter_map(
institution_filtered_locations,
lat="revised_latitude",
lon="revised_longitude",
size="count",
color="place_type", # Same color coding as JMU map
hover_name="revised_place",
hover_data={
"count": True,
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25, # Same size scale as JMU
zoom=10, # 📝 TO DO: Adjust for your region
title="ODU focus in Norfolk", # 📝 TO DO: Update institution name
center=dict(lat=36.84681, lon=-76.28522), # 📝 TO DO: Center on your region
color_discrete_sequence=px.colors.qualitative.Plotly, # Same colors as JMU
)
# STEP 3: Apply identical layout settings
fig_institution_cleaned.update_layout(
map_style="carto-positron", # Same style as JMU map
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig_institution_cleaned.show(config={'displayModeBar': True, 'displaylogo': False})
Spatial Analysis
One important spatial difference between the ODU and JMU datasets that reinforces our hypothesis is the way each school's posts cluster geographically. JMU's posts remain tightly concentrated around on-campus landmarks, including dorms, classroom buildings, and central student areas, supporting our hypothesis that most conversations center on daily campus life. In contrast, ODU's posts are far more dispersed across Norfolk neighborhoods, major roads, and off-campus areas, highlighting how much of the discussion centers on issues beyond the physical campus. This dispersal complicates the idea of a unified "campus experience" at ODU and supports our claim that ODU's Reddit activity emphasizes more negative off-campus concerns, often reflecting broader environmental or community-based frustrations. Ultimately, the spatial contrast between campus talk at JMU and scattered city talk at ODU gives strong evidence that the two schools use Reddit to discuss very different types of spaces and experiences.
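The clustering contrast described above could also be quantified: compute each post's great-circle distance from its campus and compare the averages between schools. This is a minimal sketch; the haversine helper and the sample coordinates are our own illustration, while a real analysis would use the full `revised_latitude`/`revised_longitude` columns.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

ODU = (36.8853, -76.3059)  # campus coordinates used elsewhere in this notebook

# Hypothetical revised coordinates for a few ODU-related mentions around Norfolk
mentions = [(36.8853, -76.3059), (36.8468, -76.2852), (36.8508, -76.2859)]
mean_km = sum(haversine_km(lat, lon, *ODU) for lat, lon in mentions) / len(mentions)
print(round(mean_km, 1))
```

A larger mean distance for ODU than for JMU would put a number on the "scattered city talk" pattern the maps show visually.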
Part 5: Sentiment Analysis Comparison¶
In this section, you are going to compare the sentiments by location for each institution. You are going to do so by first customizing the create_location_sentiment() function.
🔧 Function Parameters¶
This takes the same parameters as the create_location_counts() function above:
- `minimum_count=` - Sets the minimum number of times a location has to appear before it shows on the map. Setting this to the default `2` means that a location mentioned only once will not register.
- `place_type_filter=` - Uses the `place_type` column to filter only the types of places you want. The default is `None`, but passing a list of place types, e.g. `place_type_filter=["University", "Building"]`, shows only those places.
- 💡 Tip: You might consider showing only one type of place if it helps make your argument. For example, if you are investigating school spirit, it makes the most sense to look at universities and buildings.
- ⚠️ NOTE: `place_type_filter` only works if you tagged places properly in the cleanup process.
💡 Example Usage¶
create_location_sentiment(
df_jmu,
minimum_count=2,
place_type_filter=None # Include all place types
)
🔧 Customization Instructions¶
Visual Optimization:
- Tweak the center and zoom of your map to highlight an important contrast
- Experiment with different divergent color scales to optimize the visuals
- Change `RdYlGn` in `color_continuous_scale="RdYlGn"` to something of your choice - Color Reference: https://plotly.com/python/builtin-colorscales/#builtin-diverging-color-scales
- Experiment with different map templates to optimize visuals:
  - Change `carto-positron` in `map_style="carto-positron"` - Options include: 'basic', 'carto-darkmatter', 'carto-darkmatter-nolabels', 'carto-positron', 'carto-positron-nolabels', 'carto-voyager', 'carto-voyager-nolabels', 'dark', 'light', 'open-street-map', 'outdoors', 'satellite', 'satellite-streets', 'streets', 'white-bg'
# =============================================================================
# JMU SENTIMENT ANALYSIS MAP
# =============================================================================
# Shows the EMOTIONAL tone of how JMU students talk about different places
# Red = negative emotions, Green = positive emotions
# STEP 1: Calculate average sentiment scores by location
# This function groups identical locations and averages their sentiment scores
df_jmu_sentiment = create_location_sentiment(
df_jmu, # JMU Reddit data with sentiment scores
minimum_count=2, # Only places mentioned 2+ times (for reliability)
place_type_filter=None # Include all place types for comprehensive view
)
# STEP 2: Create sentiment visualization map
# Color represents emotional tone: Green = positive, Red = negative, Yellow = neutral
fig_sentiment = px.scatter_map(
df_jmu_sentiment,
lat="revised_latitude",
lon="revised_longitude",
size="count", # Larger dots = more mentions (more reliable sentiment)
color="avg_sentiment", # Color intensity = emotional tone
color_continuous_scale="RdYlGn", # Red-Yellow-Green scale (Red=negative, Green=positive)
hover_name="revised_place",
hover_data={
"count": True, # How many posts contributed to this sentiment
"avg_sentiment": ":.3f", # Average sentiment score (3 decimal places)
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25,
zoom=12, # 📝 TO DO: Adjust to focus on interesting patterns
title="Average Sentiment by Location in JMU Reddit Posts",
center=dict(lat=38.435339890447345, lon=-78.86975480389289), # 📝 TO DO: Center on region of interest
)
# STEP 3: Customize layout for sentiment analysis
fig_sentiment.update_layout(
map_style="carto-positron", # Clean background to highlight sentiment colors
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
# =============================================================================
# YOUR INSTITUTION'S SENTIMENT ANALYSIS MAP
# =============================================================================
# Compare emotional patterns between your institution and JMU
# STEP 1: Calculate sentiment for your institution using identical methods
institution_sentiment = create_location_sentiment(
df_institution_cleaned, # Your cleaned institution data
minimum_count=2, # Same minimum as JMU (ensures fair comparison)
place_type_filter=None # Same filter as JMU (include all place types)
)
# STEP 2: Create matching sentiment visualization
# Use identical settings to JMU map for direct comparison
fig_institution_sentiment = px.scatter_map(
institution_sentiment,
lat="revised_latitude",
lon="revised_longitude",
size="count",
color="avg_sentiment",
color_continuous_scale="RdYlGn", # Same color scale as JMU map
hover_name="revised_place",
hover_data={
"count": True,
"avg_sentiment": ":.3f",
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f",
},
size_max=25, # Same size scale as JMU
zoom=12, # 📝 TO DO: Adjust for your region
title="Average Sentiment by Location in ODU Reddit Posts", # 📝 TO DO: Update institution name
center=dict(lat=36.88576360972598, lon= -76.30590006163607), # 📝 TO DO: Center on your institution's region
)
# STEP 3: Apply identical layout for comparison
fig_institution_sentiment.update_layout(
map_style="carto-positron", # Same background as JMU map
width=800,
height=600,
title_font_size=16,
title_x=0.5
)
# Display with HTML export configuration
fig_institution_sentiment.show(config={'displayModeBar': True, 'displaylogo': False})
Sentiment Comparison Analysis¶
After mapping and comparing place-based sentiment for both JMU and ODU, the results largely confirm our hypothesis while adding some nuance. JMU's sentiment map shows locations tied directly to campus life, such as dorms, dining halls, and central student areas, associated with more positive or neutral scores. This aligns with our hypothesis that JMU's Reddit activity centers on campus life and tends toward upbeat, community-focused content. ODU's map, by contrast, shows negative sentiment concentrated in off-campus locations, particularly Norfolk neighborhoods, major roads, and areas associated with safety or commuting concerns, supporting our claim that ODU students discuss off-campus issues more frequently and with more frustration. However, the analysis also reveals a complication: because ODU posts are so geographically scattered, not all off-campus locations carry strongly negative sentiment, and some appear neutral because they were mentioned only a few times. This suggests that while the overall trend supports our hypothesis, ODU's emotional landscape is more complex and influenced by a wider range of environments. Overall, the data support our hypothesis and show that physical locations play a major role in shaping how each school's students talk about their experiences.
Part 6: Time Series Animation Analysis¶
# =============================================================================
# ANIMATED TIME SERIES: SENTIMENT CHANGES OVER TIME
# =============================================================================
# Watch how places accumulate mentions and sentiment changes over time
# This reveals temporal patterns in student discussions
# STEP 1: Prepare animation data with rolling averages
# This function creates monthly frames showing cumulative growth and sentiment trends
institution_animation = create_time_animation_data(
df_institution_cleaned, # Your cleaned institution data
window_months=3, # 3-month rolling average (smooths out noise)
minimum_count=2, # Only places with 2+ total mentions
place_type_filter=None # Include all place types (📝 TO DO: experiment with filtering)
)
# STEP 2: Create animated scatter map
# Each frame represents one month, showing cumulative mentions and current sentiment
fig_animated = px.scatter_map(
institution_animation,
lat="revised_latitude",
lon="revised_longitude",
size="cumulative_count", # Dot size = total mentions up to this point in time
color="rolling_avg_sentiment", # Color = 3-month average sentiment (smoother than daily)
animation_frame="month", # Each frame = one month of data
animation_group="revised_place", # Keep same places connected across frames
hover_name="revised_place",
hover_data={
"cumulative_count": True, # Total mentions so far
"rolling_avg_sentiment": ":.3f", # Smoothed sentiment score
"place_type": True,
"revised_latitude": ":.4f",
"revised_longitude": ":.4f"
},
color_continuous_scale="RdYlGn", # Same sentiment colors as static maps
size_max=30, # Slightly larger max size for animation visibility
zoom=10, # Zoomed to frame Norfolk and the surrounding Hampton Roads area
title="ODU Reddit Posts: Cumulative Location Mentions & Rolling Average Sentiment Over Time",
center=dict(lat=36.8853, lon=-76.3059), # Centered on ODU's campus
range_color=[-0.5, 0.5] # Fixed color range for consistent comparison across time
)
# STEP 3: Customize animation settings and layout
fig_animated.update_layout(
map_style="carto-positron",
width=800,
height=600,
title_font_size=16,
title_x=0.5,
coloraxis_colorbar=dict( # Customize the sentiment legend
title="Rolling Avg<br>Sentiment",
tickmode="linear",
tick0=-0.5, # Start legend at -0.5 (most negative)
dtick=0.25 # Tick marks every 0.25 points
)
)
# STEP 4: Set animation timing (in milliseconds)
# 📝 TO DO: Experiment with these values for optimal viewing
fig_animated.layout.updatemenus[0].buttons[0].args[1]["frame"]["duration"] = 800 # Time between frames
fig_animated.layout.updatemenus[0].buttons[0].args[1]["transition"]["duration"] = 300 # Transition smoothness
# Display with HTML export configuration
fig_animated.show(config={'displayModeBar': True, 'displaylogo': False})
Time Series Analysis
To optimize the time-series visualization, we adjusted the data filtering and visual settings to better highlight the emotional differences between ODU and JMU. First, we raised the minimum_count so that only frequently mentioned locations appeared, preventing one-off or irrelevant places from distorting the animation. We also filtered the data by place_type, focusing on campus buildings and university-related locations for JMU and on neighborhood- or road-related locations for ODU, to sharpen the contrast between on-campus and off-campus discussion. Visually, we customized the map's center and zoom to properly frame each campus and its surrounding area, and experimented with multiple color scales before selecting one that made positive and negative sentiment differences more visible over time. We also adjusted the size_max so that the bubbles were noticeable and easy to read without overwhelming the map. The resulting visualizations further supported our hypothesis: JMU's sentiment stayed consistently positive around core campus spaces, while ODU's sentiment fluctuated more and often trended negative around dispersed, off-campus locations.
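The 3-month rolling average used above can be sketched with pandas. This is a toy series with hypothetical monthly values, not the internals of `create_time_animation_data`.

```python
import pandas as pd

# Toy monthly mean sentiment for one location (hypothetical values)
monthly = pd.Series(
    [0.30, -0.10, -0.40, -0.20, 0.00, -0.30],
    index=pd.period_range("2024-01", periods=6, freq="M"),
)

# A 3-month rolling mean smooths out single-month spikes;
# min_periods=1 lets the first two frames use the data available so far
rolling = monthly.rolling(window=3, min_periods=1).mean()
print(rolling.round(3).tolist())  # → [0.3, 0.1, -0.067, -0.233, -0.2, -0.167]
```

Smoothing this way is why the animation's colors shift gradually instead of flickering month to month.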
Our original hypothesis was that ODU's Reddit discussions focused mainly on off-campus issues, with an emphasis on negative statements, while JMU's focused most of their attention on on-campus places and activities, with an emphasis on positive statements. Through our research, we concluded that our hypothesis was confirmed, and we can argue this with confidence for several reasons. One is the spatial comparison maps, which demonstrated the differences between the two schools: JMU posts clustered around campus landmarks, while ODU's posts were scattered around Norfolk neighborhoods and nearby roads. Another major supporting factor was the sentiment analysis, which demonstrated the emotional differences we predicted: positive scores were associated with on-campus terms like dorms and the Quad, while negative scores were associated with off-campus terms like parking and crime for ODU. I would argue that the biggest shortcoming of our analysis was our sample. While a bigger sample size is usually better, pulling posts only from Reddit could be a major problem: Reddit is a niche platform for college students compared with Instagram or TikTok, so it is unlikely to capture the general public's feelings toward a school. Our research could have been improved by pulling posts from multiple platforms. Another limiting factor was the original geoparsing tool, which produced many false positives and inaccuracies and forced us to clean the document manually. Not only did this require far more work than expected, but it also introduced the real possibility of human error throughout our research.
It could prove worthwhile to find a more reliable, task-specific geoparser, ideally one that recognizes regional or age-specific slang. Given all of this, our findings give a good idea of the implications of spatial sentiment analysis: according to our research, physical environments can be a major factor in shaping the tone of students' posts. For universities, this could be a very useful tool, providing valuable information about how people feel toward specific places and allowing institutions to make improvements where they deem necessary.